79 research outputs found

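    Exploration de la dynamique humaine basée sur des données massives de réseaux sociaux de géolocalisation : analyse et applications (Exploring human dynamics based on massive location-based social network data: analysis and applications)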

    Human dynamics is an essential aspect of human-centric computing. As a transdisciplinary research field, it focuses on understanding the underlying patterns, relationships, and changes of human behavior. By exploring human dynamics, we can understand not only an individual's behavior, such as presence at a specific place, but also collective behaviors, such as social movements. Understanding human dynamics can thus enable various applications, such as personalized location-based services in smart-city scenarios. However, before ubiquitous smart devices (e.g., smartphones) became available, it was hard in practice to collect large-scale human behavior data. With the ubiquity of GPS-equipped smartphones, location-based social media has gained popularity in recent years, making large-scale user activity data attainable. Via location-based social media, users can share their activities as real-time presences at Points of Interest (POIs), such as a restaurant or a bar, within their social circles. Such data brings an unprecedented opportunity to study human dynamics. In this dissertation, based on large-scale location-centric social media data, we study human dynamics from both individual and collective perspectives. From the individual perspective, we study user preference for POIs at different granularities and its applications in personalized location-based services, as well as the spatio-temporal regularity of user activities. From the collective perspective, we explore global-scale collective activity patterns at both country and city granularities, and identify their correlations with diverse human cultures.

    PrivCheck: Privacy-Preserving Check-in Data Publishing for Personalized Location Based Services

    With the widespread adoption of smartphones, we have observed an increasing popularity of Location-Based Services (LBSs) in the past decade. To improve user experience, LBSs often provide personalized recommendations to users by mining their activity (i.e., check-in) data from location-based social networks. However, releasing user check-in data makes users vulnerable to inference attacks, as private data (e.g., gender) can often be inferred from the users' check-in data. In this paper, we propose PrivCheck, a customizable privacy-preserving check-in data publishing framework that gives users continuous protection against inference attacks. The key idea of PrivCheck is to obfuscate user check-in data such that the privacy leakage of user-specified private data is minimized under a given data distortion budget, which preserves the utility of the obfuscated data for personalized LBSs. Since users often give LBS providers access to both their historical check-in data and future check-in streams, we develop two data obfuscation methods for historical and online check-in publishing, respectively. An empirical evaluation on two real-world datasets shows that our framework can efficiently provide effective and continuous protection of user-specified private data, while still preserving the utility of the obfuscated data for personalized LBSs.

    Histosketch: fast similarity-preserving sketching of streaming histograms with concept drift

    Histogram-based similarity has been widely adopted in many machine learning tasks. However, measuring histogram similarity is challenging for streaming data, where the elements of a histogram are observed one at a time. First, the ever-growing cardinality of histogram elements makes any similarity computation inefficient. Second, concept drift in the data streams impairs the accurate assessment of similarity. In this paper, we propose to overcome these challenges with HistoSketch, a fast similarity-preserving sketching method for streaming histograms with concept drift. Specifically, HistoSketch incrementally maintains a set of compact, fixed-size sketches of streaming histograms to approximate the similarity between the histograms, while gradually forgetting outdated histogram elements. We evaluate HistoSketch on multiple classification tasks using both synthetic and real-world datasets. The results show that our method efficiently approximates similarity for streaming histograms and quickly adapts to concept drift. Compared to full streaming histograms that gradually forget outdated elements, HistoSketch dramatically reduces the classification time (a 7500x speedup) with only a modest loss in accuracy (about 3.5%).
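    The "full streaming histogram with gradual forgetting" that the abstract uses as a baseline can be illustrated with a minimal sketch. The geometric decay rule, the class name `DecayedHistogram`, and the `decay` parameter are assumptions made for illustration; the abstract does not specify the forgetting scheme used in the paper.

```python
from collections import defaultdict


class DecayedHistogram:
    """Streaming histogram whose counts decay geometrically (assumed scheme)."""

    def __init__(self, decay=0.99):
        if not 0.0 < decay <= 1.0:
            raise ValueError("decay must be in (0, 1]")
        self.decay = decay  # hypothetical forgetting factor applied per arrival
        self.counts = defaultdict(float)

    def add(self, element):
        # Age every existing count, then credit the newly observed element.
        for key in self.counts:
            self.counts[key] *= self.decay
        self.counts[element] += 1.0


hist = DecayedHistogram(decay=0.5)
for item in ["a", "a", "b"]:
    hist.add(item)
print(dict(hist.counts))
```

    Each update touches every stored element, so the cost grows with the histogram's cardinality. That growth is the inefficiency a fixed-size sketch is meant to avoid.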

    Geographic differential privacy for mobile crowd coverage maximization

    For real-world mobile applications such as location-based advertising and spatial crowdsourcing, a key to success is targeting the mobile users who can maximally cover certain locations in a future period. To find an optimal group of users, existing methods often require information about users' mobility history, which may cause privacy breaches. In this paper, we propose a method to maximize a mobile crowd's future location coverage under a guaranteed location privacy protection scheme. In our approach, users only need to upload one of their frequently visited locations, and more importantly, the uploaded location is obfuscated using a geographic differential privacy policy. We propose both analytic and practical solutions to this problem. Experiments on real user mobility datasets show that our method significantly outperforms state-of-the-art geographic differential privacy methods by achieving a higher coverage under the same level of privacy protection.

    Engineering a Simplified 0-Bit Consistent Weighted Sampling

    The Min-Hashing approach to sketching has become an important tool in data analysis, information retrieval, and classification. To apply it to real-valued datasets, the ICWS algorithm has become a seminal approach that is widely used and provides state-of-the-art performance for this problem space. However, ICWS suffers a computational burden as the sketch size K increases. We develop a new Simplified approach to the ICWS algorithm that enables us to obtain over 20x speedups compared to the standard algorithm. The validity of our approach is demonstrated empirically on multiple datasets and scenarios, showing that our new Simplified CWS obtains the same quality of results while being an order of magnitude faster.
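    For orientation, the standard ICWS procedure that the abstract takes as its starting point can be sketched as follows. This follows the commonly described formulation of ICWS (per-element Gamma(2, 1) and Uniform(0, 1) draws, with the smallest hash value selected); it is not the Simplified algorithm of the paper, which the abstract does not detail. The function names and the string-based seeding are assumptions made for illustration.

```python
import math
import random


def icws_sketch(weights, sketch_size, seed=0, zero_bit=False):
    """Return an ICWS sketch of a dict mapping keys to positive weights.

    Each of the sketch_size entries is a (key, t) pair, or just the key when
    zero_bit is True (the "0-bit" variant discards t).
    """
    sketch = []
    for j in range(sketch_size):
        best, best_a = None, math.inf
        for key, s in weights.items():
            if s <= 0:
                continue  # CWS is defined for strictly positive weights
            # Seed by (seed, j, key) so every input vector sees the same draws.
            rng = random.Random(f"{seed}:{j}:{key}")
            r = rng.gammavariate(2.0, 1.0)
            c = rng.gammavariate(2.0, 1.0)
            beta = rng.random()
            t = math.floor(math.log(s) / r + beta)
            y = math.exp(r * (t - beta))
            a = c / (y * math.exp(r))
            if a < best_a:
                best, best_a = (key, t), a
        sketch.append(best[0] if zero_bit else best)
    return sketch


def estimate_similarity(sketch_a, sketch_b):
    """Fraction of sketch positions on which two sketches agree."""
    matches = sum(1 for x, y in zip(sketch_a, sketch_b) if x == y)
    return matches / len(sketch_a)


u = {"a": 3.0, "b": 1.0, "c": 2.0}
v = {"a": 1.0, "b": 1.0, "d": 2.0}
# Generalized Jaccard of u and v: sum of minima / sum of maxima = 2 / 8.
print(estimate_similarity(icws_sketch(u, 1024), icws_sketch(v, 1024)))
```

    The inner loop runs once per element for each of the K sketch entries, which is the cost that grows with the sketch size. On very small vectors such as `u` and `v`, the 0-bit variant can overestimate similarity because it ignores `t`.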

    CrimeTelescope: crime hotspot prediction based on urban and social media data fusion

    Crime is a complex social issue impacting a considerable number of individuals within a society. Preventing and reducing crime is a top priority in many countries. Given limited policing and crime reduction resources, it is often crucial to identify effective strategies to deploy the available resources. Towards this goal, crime hotspot prediction has previously been suggested. Crime hotspot prediction leverages past data to identify geographical areas likely to experience crime in the future. However, most existing techniques in crime hotspot prediction rely solely on historical crime records to identify crime hotspots, ignoring the predictive power of other data such as urban or social media data. In this paper, we propose CrimeTelescope, a platform that predicts and visualizes crime hotspots based on a fusion of different data types. Our platform continuously collects crime data as well as urban and social media data on the Web. It then extracts key features from the collected data based on both statistical and linguistic analysis. Finally, it identifies crime hotspots by leveraging the extracted features, and offers visualizations of the hotspots on an interactive map. Based on real-world data collected from New York City, we show that combining different types of data can effectively improve the crime hotspot prediction accuracy (by up to 5.2%), compared to classical approaches based on historical crime records only. In addition, we demonstrate the usability of our platform through a System Usability Scale (SUS) survey on a full prototype of CrimeTelescope.

    Location privacy-preserving task allocation for mobile crowdsensing with differential geo-obfuscation

    In traditional mobile crowdsensing applications, organizers need participants' precise locations for optimal task allocation, e.g., minimizing selected workers' travel distance to task locations. However, the exposure of their locations raises privacy concerns. In particular, participants who are not eventually selected for any task sacrifice their location privacy in vain. Hence, in this paper, we propose a location privacy-preserving task allocation framework with geo-obfuscation to protect users' locations during task assignments. Specifically, we make participants obfuscate their reported locations under the guarantee of differential privacy, which can provide privacy protection regardless of adversaries' prior knowledge and without the involvement of any third-party entity. In order to achieve optimal task allocation with such differential geo-obfuscation, we formulate a mixed-integer non-linear programming problem to minimize the expected travel distance of the selected workers under the constraint of differential privacy. Evaluation results on both simulation and real-world user mobility traces show the effectiveness of our proposed framework. In particular, our framework outperforms Laplace obfuscation, a state-of-the-art differential geo-obfuscation mechanism, by achieving 45% less average travel distance on the real-world data.
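    The Laplace obfuscation baseline named above is commonly realized as the planar Laplace mechanism from the geo-indistinguishability literature. The sketch below assumes that reading; it does not show the optimized mechanism proposed in the paper. The function name, the flat-plane coordinates, and the `epsilon` unit (per metre) are assumptions made for illustration.

```python
import math
import random


def planar_laplace(x, y, epsilon, rng=random):
    """Return (x, y) perturbed with planar Laplace noise.

    Coordinates are assumed to lie in a flat, projected plane (e.g. metres);
    epsilon is the privacy parameter per unit of distance.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    # Direction is uniform on the circle.
    theta = rng.uniform(0.0, 2.0 * math.pi)
    # Radial density eps^2 * r * exp(-eps * r) is Gamma(shape=2, scale=1/eps).
    radius = rng.gammavariate(2.0, 1.0 / epsilon)
    return x + radius * math.cos(theta), y + radius * math.sin(theta)


reported = planar_laplace(1000.0, 2000.0, epsilon=0.01)
print(reported)
```

    The expected displacement is 2 / epsilon, so a smaller epsilon gives stronger privacy at the price of a noisier reported location, which in turn lengthens the travel distances that the task allocation has to absorb.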

    Privacy-preserving social media data publishing for personalized ranking-based recommendation

    Personalized recommendation is crucial to help users find pertinent information. It often relies on a large collection of user data, in particular users' online activity (e.g., tagging/rating/checking-in) on social media, to mine user preference. However, releasing such user activity data makes users vulnerable to inference attacks, as private data (e.g., gender) can often be inferred from the users' activity data. In this paper, we propose PrivRank, a customizable and continuous privacy-preserving social media data publishing framework that protects users against inference attacks while enabling personalized ranking-based recommendations. Its key idea is to continuously obfuscate user activity data such that the privacy leakage of user-specified private data is minimized under a given data distortion budget, which bounds the ranking loss incurred from the data obfuscation process in order to preserve the utility of the data for enabling recommendations. An empirical evaluation on both synthetic and real-world datasets shows that our framework can efficiently provide effective and continuous protection of user-specified private data, while still preserving the utility of the obfuscated data for personalized ranking-based recommendation. Compared to state-of-the-art approaches, PrivRank achieves both better privacy protection and higher utility in all the ranking-based recommendation use cases we tested.

    Knowledge graph embeddings

    With the growing popularity of multi-relational data on the Web, knowledge graphs (KGs) have become a key data source in various application domains, such as Web search, question answering, and natural language understanding. In a typical KG such as Freebase (Bollacker et al. 2008) or Google’s Knowledge Graph (Google 2014), entities are connected via relations. For example, Bern is capital of Switzerland. Formally, a popular approach to represent such relational data is to use the Resource Description Framework. It defines a fact as a triple (subject, predicate, and object), which is also known as head, relation, and tail or (h, r, t) for short. Following the above example, the head, relation, and tail..